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1. Introduction 


Digital Elevation Models (DEM) are digital representations of the topographic information of 
a geographic area and are based on a high number of points defined by the X-, Y- and Z 
coordinates describing the Earth’s three dimensional shape. The term DEM is generic and 
includes two distinct products: Digital Terrain Models (DTM), describing the ground of the 
imaged area, and Digital Surface Models (DSM) describing the surface including the above 
ground objects elevation. DEMs may either be arranged regularly in a raster or in a random 
point cloud. These products have become a major component for an extensive number of 
remote sensing and environmental applications, such as mapping, orthoimage generation, 
virtual terrain reconstruction and visualization, simulations, civil engineering, land monitor- 
ing, forestry, flood planning, erosion control, agriculture, environmental planning, archaeol- 
ogy and others. Different techniques based on Earth imaging products belonging to different 
families are commonly used in order to obtain the elevation information. The most common 
techniques are based on Optical imagery, LiDAR (Light Radar) and Synthetic Aperture Radar 
(SAR). In this chapter the main focus will be on techniques exploiting Optical and SAR sensors: 
InSAR (Interferometric SAR) and Photogrammetry. The generic term DEM will be used, even 
if usually the data recorded with optical sensors refers to above ground objects (DSM), while 
the one obtained with SAR usually is a DSM/DTM composite. An overview and some results 
will be shown, outlining the main strengths and weaknesses of the approaches, thus leading 
to the fusion hypothesis in order to exploit the inter-platform complementarity and the 
intrinsic advantages of both techniques. Two different product levels (raster and point cloud) 
— on which the fusion approaches have to be built — are also proposed, as well as examples of 
such approaches. 
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2. DEM generation from Synthetic Aperture Radar sensors 


The generation of digital elevation models has been among the first techniques exploiting the 
phase information of SAR data. Being an active sensing technology, SAR offers day and night 
operability. However, one of the major drawbacks of this technology is posed by the temporal 
decorrelation between passes of SAR data. This drawback has been addressed mainly in two 
different ways. The first one is the minimization or suppression of the temporal separation 
between two or more acquisitions exploited in the DEM generation process. The second way 
relies on the selection of different frequencies/polarizations. 


The different approaches available for the spaceborne platform as well as the basic interfero- 
metric framework will be introduced in the following Sections. 


2.1. SAR interferometry 


Interferometry is among the main techniques applied to derive elevation data from SAR 
images exploiting their phase information, while radargrammetry [1], which has been 
developed in the eighties, exploits their intensity information (see the radargrammetry chapter 
for further information). In this section, a brief description of the InSAR technique, as well as 
an example of processing chain to generate INSAR DEMs, are given. A more in-depth overview 
of SAR interferometry can be found in [2]. 


SAR sensors belong to the so-called “active” family of instruments, in that they emit a wave 
which rebounds on the target and returns to the satellite where it is recorded. The wavelength 
of the outgoing signal is known as it is consequently its corresponding phase. This implies the 
possibility to compare the return phase with the reference one, thus allowing their coherence 
estimate. Ideally, the return phase should be a function of the satellite-target and back distance 
plus a phase difference which can be measured. In reality, however, several other factors affect 
the phase value, i.e. systematic effects which have to be considered in order to obtain useful 
information. 


In interferometry, two images covering the same area acquired from the same or slightly 
different position are used in order to obtain the topographic information. The phase difference 
between the two acquisitions is used to produce the interferogram. The difference is measured 
in radians and is recorded in fringes each one representing a 2 n cycle. The signal path length 
to the ground and back consists of a number of whole wavelengths plus some fraction of a 
wavelength, this means that the return wave phase is function of the distance between the 
target and the emitter/receiver. Said fraction can be seen as a phase shift in the returning wave. 
The total number of whole wavelengths is unknown, as it is the total distance to the satellite, 
but the phase shift defined by the extra fraction of a wavelength can be measured by the 
receiver with a high accuracy. 


The measured phase is influenced by different factors throughout the sensor-target-sensor 
path. One of the most important among them coming into play is the interaction with the 
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ground surface. Reflection, depending on the physical properties of the target material, may 
cause phase changes. As a consequence, the return signal for every pixel is the result of the 
summation of the sub-pixel interactions between the signal and a multitude of smaller targets 
included in the imaged ground area. These targets may show different physical and topo- 
graphic properties between one another, e.g. different dielectric properties, a different satellite- 
target distance or varying surface roughness. This leads to a signal which can be uncorrelated 
with the ones from the surrounding pixels and which is defined as “speckle”. Orbital effects, 
which are other important signal noise contributors, should be considered as well. The 
importance of these factors derives from geometrical constraints, since images from satellite 
platforms with different orbits cannot be compared. The spatial position from which the 
images are acquired should be as close as possible and the data comparable, hence they must 
come from the same orbital track of a single satellite platform. Their baseline difference, i.e. 
their perpendicular distance difference, is known within a few centimeters and causes a regular 
phase difference in the interferogram, which is modelled and removed. The position offset 
also implies a difference in the topographic effects on phase between the images. The topo- 
graphic height needed to produce a phase change fringe, known as altitude of ambiguity, 
becomes smaller when the baseline becomes longer. The latter aspect can be exploited in order 
to obtain the topographic height of the target, allowing the generation of a DEM. The topo- 
graphic phase contribution can be quantified and subtracted from the interferogram by means 
of a previously available coarser DEM in conjunction with the orbital information. 


Once these contributions have been identified and removed, the final interferogram contains 
the residual signal, despite the presence of remaining phase noise. The residual signal 
represents the phase change caused by the varying distance of the ground pixel from the 
satellite. In the interferogram, one phase difference fringe is recorded when the ground motion 
corresponds to half the radar wavelength. In the two-way travel distance this tallies with an 
increase of a whole wavelength. The absolute movement of a point can either be obtained by 
means of Ground Control Points (GCPs) recorded from GPS or similar instruments or by 
assuming that an area in the interferogram did not experience any deformation. 


In the following paragraphs, an example of processing chain used to produce interferograms 
and subsequently DEMs is outlined. First, the two-pass SAR images have to be focused, 
exploiting a phase preserving SAR focusing. Subsequently, the images have to be coregistered, 
to quantify their offset and geometric difference, thus ensuring their inter-comparability. A 
coarse and fine coregistration procedure exploiting a coarser DEM is used to define the 
parameters for the inter-image point alignment. The parameters are iteratively refined, 
resampling the slave image to match the ground area of the master image. A common Doppler 
bandwidth filtering is an additional step to apply before the interferogram generation in order 
to minimize the decorrelation which can be caused by the presence of an offset in the Doppler 
centroids. 


The Interferogram is then generated, a spectral shift filtering is performed and the Hermitian 
product is computed. The interferometric phase due to the curvature of the Earth is also 
removed by DEM flattening. In this process synthetic fringes are generated either from the 
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ellipsoidal height or from a coarser DEM exploiting a backward geocoding approach. These 
fringes are cross-multiplied by the SAR interferogram, allowing the removal of the wrapped 
phase low frequency components. After the phase flattening, the complex interferogram is put 
through a filtering process to improve its signal to noise ratio, which in turn will result in a 
higher height accuracy of the final product. The interferogram must be unwrapped, interpo- 
lating it over the whole 0 to 27 phase values to obtain a continuous deformation field. This has 
to be carried out in order to solve the 2m ambiguity induced by the cyclicity of the phase values 
in the interferogram. A common approach is to apply a region growing based approach where 
a coherence threshold is applied in the growing process. If the images are characterized by 
large areas of low coherence, limiting the growing ability, a Minimum Cost Flow approach 
can be applied. This approach considers a regular square grid over the whole image. All the 
pixels whose coherence is lower than a user defined threshold are masked out. A third 
approach is derived from the latter, here the Delaunay polygons are used only on those areas 
exceeding the aforementioned coherence threshold. As a result only the points showing good 
coherence are unwrapped, avoiding any influence from the less coherent ones. The use of the 
Delaunay triangular polygons also allows the exclusion of eventual irregular areas of small 
size showing poor coherence, thus avoiding the representation of phase jumps in the inter- 
ferogram. Eventual unwrapping errors can then be corrected in a following phase editing step, 
which can be both a semi-automatic or a fully manual procedure. 


The following step aims at optimizing the geometric characteristics of the orbits, in order to 
improve the phase to height conversion and to ensure a precise geo-location. With this purpose, 
ground control points (GCPs) have to be provided, introducing absolute location information. 
Several ways to collect GCPs are possible. The most precise one being the “in-situ” collection 
through GPS, although it is the most time consuming as well. Another option is to manually 
collect the GCPs on the image itself, obtaining the height from a reference DEM. This process 
can also be accomplished in an automatic fashion, with the software providing the user with 
a series of candidate pixels — ideally the best suited for the process. The synthetic phase, which 
has been subtracted in the flattening process, is then added back to the unwrapped phase. This 
ensures that the final DEM is obtained only from the SAR data. Ultimately, the DEM is obtained 
in the phase-to-map conversion, in which the relation between the phase and the topographic 
distance is exploited in order to obtain the elevation data composing the DEM. The final 
product is projected onto the desired cartographic reference system, taking into account all the 
geodetic and cartographic parameters needed. During this process the DEM precision can also 
be extrapolated keeping in mind that phase and topography are proportional, the theoretical 
elevation standard deviation can be derived from the interferometric phase dispersion. The 
latter can be obtained from a relation based on the signal wavelength, and represents the 
dispersion of the estimates around an expected value. This last aspect is of main importance 
to quantify the quality of a DEM, as well as providing useful information for further processing 
steps such as mosaicing or, as it will be explained in the following sections, in DEM fusion. An 
example of a DEM along with its respective precision map is shown in Figure 1. 
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Figure 1. Example of a TanDEM-X DEM (left) and the respective Precision map (right) over a region of Australia. 


2.2. Spaceborne InSAR DEM generation 


The ERS-1/2 (European Remote Sensing satellite 1 and 2) Tandem mission (1995, [3][2]) has 
been the first to give the ability to cover the same area with two instruments over 1 day. The 
two satellites shared the same orbital plane, satisfying the orbital requirements imposed by an 
interferometric process. Moreover, the 1 day coverage interval is a very valuable characteristic 
to generate DEMs, since it lessens the temporal decorrelation issue. A nation-wide operational 
example of a DEM produced using SARscape® is given in Figure 2, see [4] for a technical 
description. The DEM covers the whole surface of Switzerland, approximately 42’000 km?, 
with a 25 m horizontal resolution and height accuracies ranging from 7 to 15 metres, going 
from moderate to steep topography respectively. 


Figure 2. Example of a nation-wide ERS-1/2 DEM over Switzerland. 
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The decorellation issue of spaceborne SAR data has then been the main object of the Shuttle 
Radar Topography Mission [5]-[7]. The design of the instrument allowed the acquisition of 
space-based single-pass interferometric data, which is the best suited for InSAR DEM gener- 
ation. The mission was flown on board of the Space Shuttle in February 2000 and was designed 
to obtain elevation radar data on a near-global scale. The main objective was to generate the 
most complete high-resolution digital topographic database of planet Earth. The instrument's 
C-Band and X-Band radar sensors were used to cover most of Earth's (land) surface lying 
between latitude 60 degrees North and 56 degrees South, providing a coverage of some 80 per 
cent of Earth's land. 


The second version (V2) of the SRTM-C digital topographic data, also known as the finished 
version, has been released in recent years. This version resulted from an extensive editing effort 
led by the National Geospatial Intelligence Agency (NGA). Some of the most valuable features 
of this version are represented by the well-defined water bodies and coastlines, as well as the 
absence of single pixel errors such as spikes and wells. Note however that some areas of missing 
data (voids) can still be found. Moreover, in Version 2 directory one can also find a vector mask 
defining the coastline. The vector layer has been defined by NGA and is commonly known as 
the SRTM Water Body Data. The data is available at reproduction cost or as free download, 
and offers a 3 arc-seconds (about 90 m) spatial resolution. 


After the availability to the public of the first SRTM data, an important effort has been invested 
in its accuracy validation. The accuracy assessment methodologies and tests have at first been 
applied to limited areas, some examples can be found in [8]-[11]. Finally, the accuracy 
assessment of the whole dataset has been accomplished as well [12]-[15]. The complementarity 
between C and X has also been studied [16][17]. 


An important and peculiar issue related to the generation of the SRTM-C final DEM has been 
the filling of the voids plaguing the processed SAR data. The main areas in which this issue 
was present were characterized by severe topography or very low backscatter, characteristics 
renowned for their effect on SAR signal acquisitions. The filling subject has been covered in a 
number of publications, for example in [18]-[23], in which the authors proposed approaches 
either based on pure spatial interpolation or also incorporating the height information derived 
from other datasets. A first example of a completely filled SRTM dataset is the SRTM DEM 
Version 3, generated by CGIAR. A later version (Version 4), has also been released. In the latter, 
the voids have been further filled by including more auxiliary DEMs and SRTM30 data where 
available. Moreover, improved interpolation techniques [24] have been applied to enhance the 
product. These datasets are freely available online [25]. 


The SRTM dataset is definitely suitable as input layer in a vast number of applications for 
different topics and disciplines. The major contribution offered by SRTM can be resumed in 
the availability of a dataset with very good height accuracy, even considering its spatial 
resolution. Moreover, a good consistency of its characteristics over a very large area is 
available, allowing studies focussing on very large regions, where DEMs at higher resolution / 
height accuracy might also be obtained afterwards (in a preliminary study) or at the same time. 
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The application fields in which SRTM data can be used are disparate. Applications to geo- 
morphology [26][27], volcanology [28], hydrological studies [22],[29]-[31], forestry research 
[32]-[34], archaeology [35], glaciers volume monitoring [36], and many others have been 
proposed. 


As discussed, SRTM data offers a large coverage, but at the expense of spatial resolution. In 
order to improve the results obtained from the SRTM mission and to exploit the existing high- 
or very-high resolution SAR missions, the research has currently focused on the identification 
of satellite SAR missions to complement the existing ones with the main purpose of very high 
resolution DSMs. The main concept, introduced among others by [37][38], is the definition and 
assembly of satellite constellations composed by either passive or active SAR satellites orbiting 
around a master sensor. Single-pass capabilities are another main requirement that should be 
fulfilled. The concept has been studied in [39]-[43], finally resulting in the TanDEM-X (Terra- 
SAR-X add-on for Digital Elevation Measurement, [44]) mission, launched in 2010. The latter 
will work in conjunction with the TerraSAR-X mission, providing a second satellite orbiting 
around the first one. This type of pairing provides a continuous combination of along-and 
across-track baseline, suitable respectively for currents monitoring and for DEM generation. 
This satellite constellation will be used to produce the World(DEM™, a DTED (Digital Terrain 
Elevation Data) level 3 DSM [46], which will be available in 2014. Some initial examples of such 
very-high resolution DEMs, have been shown by [47] and [48]. One example of TanDEM-X 
product is shown in Figure 3. 


Figure 3. Example of a TanDEM-X DEM (right) compared to an SRTM-V4 DEM (left) over Australia. 


A similar concept has been developed for the COSMO-SkyMed mission (see [45]), also 
completed in 2010 with the launch of the fourth satellite of the constellation. The missions 
considered in these studies take into account high frequency (X-Band) sensors. A more in- 
depth example of a COSMO-SkyMed DEM compared to the SRTM-V4 equivalent is given in 
Figure 4. 


In Figure 4, the substraction of the two DEMs is given, showing the residuals in meters (mid- 
left image). This map and the corresponding normal distribution (mid-right image), show that 
on average the relative positioning between the two products is quite similar, with the majority 
of the observations being between the -12/12 meters range. However, the COSMO-SkyMed 
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Figure 4. Comparison between an SRTM-V4 DEM (top-left) and a COSMO-SkyMed DEM (top-right). Height residual in 
meters between the DEMs (mid-left) and corresponding residual distribution (mid-right). Elevation profile over a test 
zone of SRTM-V4 (bottom-left) versus COSMO-SkyMed (bottom-right), the test profiles were collected over the input 
DEMs as shown by the red line in the top images. Image collected over the Malawi region. Image courtesy of A.S.I. 
Agenzia Spaziale Italiana. 


DEM has a much higher spatial resolution and altimetric precision than the SRTM data, 
allowing for a better slope definition as shown by the profiles at the bottom of Figure 4. In the 
SRTM data one can clearly see that the altimetric and geometric detail is lost. The better slope 
representation of COSMO-SkyMed also allows a better absolute radiometric correction of 
intensity data, while, having such a higher three dimensional spatial definition, the data is 
particularly suited in geo-information processing. 
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An alternative to these solutions is the exploitation of ALOS PALSAR L-Band data, if one looks 
at stable scattering mechanisms at lower frequencies. The lower frequency of the PALSAR 
system offers higher penetration in vegetated areas, resulting in a higher temporal correlation. 
This results in a lower phase noise compared to C-Band and X-Band systems [50]. In Figure 
5, an example of PALSAR DEM is given in comparison with the respective SRTM counterpart. 
A better slope definition is appreciable with ALOS-PALSAR as well, as shown by the profiles 
at the bottom of Figure 5. 
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Figure 5. Comparison between SRTM-V4 DEM (top-left) and ALOS-PALSAR DEM (top-right) over the Malawi region. 
Elevation profile over a test zone of SRTM-V4 (bottom-left) versus ALOS-PALSAR (bottom-right), the test profiles were 
collected over the input DEMs as shown by the red line in the top images 


3. DEM generation from spaceborne optical sensors 


3.1. VHR optical sensors 


Today an increasing number of Earth-observation platforms are equipped with Very High- 
Resolution (VHR) optical sensors characterized by a ground resolution of less than 1m [51], 
enabling the discrimination of fine details, like buildings and individual trees. The radiometric 
and geometric quality of the satellite images can be compared with original digital aerial 
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images. The image orientation has been simplified by using rational polynomial functions [52] 
and the direct sensor orientation has been improved, allowing, in some cases, the image 
processing and DEM generation without ground control information [53]-[54]. Complement- 
ing the high spatial resolution, VHR sensors are mounted on highly agile platforms, which 
enable rapid targeting, a revisit time up to 1 day and the stereoscopic coverage within the orbit 
for 3D information recovery and DEM generation. The cost of the stereo acquisition (two stereo 
images) is in general equal to the double of a single acquisition. The availability of stereo 
acquisitions in the archives of image distributors is much lower than the single acquisition 
mode, but stereo acquisition can be planned on demand. 


3.2. Image acquisition 


Earth observation optical sensors mounted on satellites acquire mainly in pushbroom mode. 
The imaging system of a pushbroom sensor consists of an optical instrument (usually a lens 
or a telescope) and Charge Coupled Devices (CCD) lines assembled in the focal plane. While 
the platform moves along its trajectory, successive lines are acquired perpendicular to the 
satellite track and stored in sequence to form a strip. In case of multispectral sensors, a strip is 
generated for each channel. In most cases the sensors are placed along a single line or in two 
or more segments staggered along the flight direction (i.e. EROS-B) or butted with some 
overlap (i.e. Quickbird) to increase the image ground resolution through a specific geometric 
and radiometric post-processing. 


In order to produce DEMs, stereoscopic coverage is mandatory. Amongst satellites using a 
stereo acquisition mode, the following can be distinguished: (1) standard across-track systems, 
(2) standard simultaneous multi-view along-track systems, and (3) agile single-lens systems. 


In the standard across-track configuration, the CCD lines and one optical system are generally 
combined with a mirror that rotates from one side of the sensor to the other across the flight 
direction, up to 30°, with the overlapping area across the flight direction. According to this 
configuration, stereo images are collected from different orbits at different dates, with temporal 
variation between the images. Examples are very popular high resolution (HR) satellite 
sensors, like SPOT-1-4 HRV and SPOT-5 HRG by CNES, IRS-1C/1D PAN by ISRO, Beijing-1 
CMT by the People Republic of China and Kompsat-1 EOC by Korea Aerospace Research 
Institute (KARI). In the standard along-track configuration, two or more strips are taken 
simultaneously from the same orbit at different angles along the flight direction. For each 
viewing direction there are one lens and one set of CCD lines placed on the focal plane. The 
along-track angles are generally fixed and the base-over-height (B/H) ratio constant for each 
satellite. This same-date along-track configuration thus minimizes the temporal variation 
between the images. Examples of HR sensors with this configuration are SPOT-5 HRS by 
CNES, ALOS-PRISM by JAXA, Cartosat-1 by ISRO and ZY-3 by the People Republic of China. 


The last generation VHR sensors use an agile configuration that was first introduced on 
IKONOS-2 in 1999. These sensors are carried on small and flexible satellites flying along sun- 
synchronous and quasi-polar orbits and use a single-lens optical system [59]. For stereo 
viewing or frequent temporal coverage, they have the ability to rotate on command around 
their cameras axes and view the target from different directions, in forward mode (from North 
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to South) or reverse mode (from South to North). Therefore, along-track and across-track stereo 
pairs of a particular area of interest are planned in advance and acquired on almost any part 
of the Earth’s surface. The general limit for off-nadir angles is 45°, but larger values are used 
in some specific situations. Some agile sensors have a synchronous acquisition mode, thus the 
satellite speed and the scanning speed are equal, and the viewing angle is constant during one 
image acquisition. Examples are IKONOS-2, Kompsat-2 and Formosat-2 RSI by National Space 
Program Office of China (NSPO). On the other hand more agile sensors, including Quickbird, 
WorldView 1 and World View-2 (DigitalGlobe), GeoEye-1 (GeoEye), EROS-A and -B (Image- 
Sat International), Orbview-3 (Orbimage) and TopSat (QinetiQ), Pléiades 1A and 1B (Astrium) 
scan the Earth in asynchronous mode: the platform velocity is higher than the scanning one, 
therefore the satellite rotates continuously during the acquisition and a CCD line scans longer 
a line on the ground. The limitation of this scheme is that the geometry is less stable. The success 
of agile single-lens systems for the acquisition of VHR stereo imagery is confirmed by its 
planned use in future missions too. GeoEye-2, planned for launch in 2013, and Worldview-3, 
planned for 2014, will have the stereo capability of their processors GeoEye-1 and WorldView-2 
respectively. 


Images acquired by VHR sensors are distributed with a certain level of processing; unfortu- 
nately, the range of terminology used to denominate the same type of image data is different 
at each data provider. In general, three main processing levels can be distinguished: a) raw 
images with only normalization and calibration of the detectors without any geometric 
correction (this level is preferred by photogrammetrists working with 3D physical models); b) 
geo-referenced images corrected for systematic distortions due to the sensor, the platform and 
the Earth rotation and curvature; c) map-oriented images, also called geocoded images, 
corrected for the same distortions as geo-referenced images and North oriented. Generally, 
very few metadata related to sensor and satellites are provided; most of metadata are related 
to the geometric processing and the ellipsoid/map characteristics. 


3.3. Image processing 


The standard photogrammetric workflow for digital surface modeling and 3D information 
extraction from stereo optical images is summarized in Figure 6. Two major steps, the image 
orientation and image matching for DEM generation, are discussed in the following sections. 


Image Project setup Image orientation Image matching DSM generation Other products 
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Figure 6. Photogrammetric workflow for DEM and 3D information extraction from stereo imagery. 
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4. Image orientation 


For DEM generation, the orientation of the images is a fundamental step and its accuracy is a 
crucial issue during the evaluation of the entire production pipeline. Over the last three 
decades various models based on different approaches have been developed [57]. Rigorous 
models try to describe the physical properties of the image acquisition through the relationship 
between the image and the ground coordinate system [58]. In case of images acquired by 
pushbroom sensors, each image line is the result of a perspective projection. The mathematical 
model is based on the collinearity equations, which relate the camera system (right-hand 3D 
system centered on the instantaneous perspective center, with the y axis parallel to the image 
line and the x axis perpendicular to the y axis, closely aligned with the flight direction) to a 3D 
local coordinate system. The collinearity equations are extended to include exterior and 
interior orientation modeling of pushbroom sensors. Usually ephemeris information is not 
precise enough for accurate mapping, and the exterior orientation has to be further improved 
with suitable time-dependent functions and estimated in a bundle adjustment. Simple third- 
order Lagrange polynomials [60][61], quadratic functions [62] and piecewise quadratic 
polynomials [56] have been proposed for this purpose. In [63] a study of anumber of trajectory 
models is given. For initial approximation of the modeling parameters, the physical properties 
of the satellite orbit and the ephemeris are used. With respect to the interior orientation, 
suitable parameters are added to model the optical design (number of lenses, viewing angles), 
the lens distortions and the CCD line distortions. A detailed description of the errors can be 
found in [64]. 


In recent years, 3D rational functions (RFs) have become the standard form to approximate 
rigorous physical models of VHR sensors. They describe the relation between normalized 
image and object coordinates through ratios of polynomials, usually of the third order. The 
corresponding polynomial coefficients, together with scale and offset coefficients for coordi- 
nate normalisation, form the so-called rational polynomial coefficients (RPCs) and are 
distributed by image vendors as metadata information. Anyway to obtain a sensor orientation 
with sub-pixel accuracy, the RPCs need to be refined with linear equations requesting more 
accurate GCPs or, more commonly, with 2D polynomial functions ([65], [66] and others). For 
the latter solution, one or two GCPs are used for zero-order 2D polynomial functions (bi- 
directional shift) and six to ten GCPs for first and second-order 2D polynomial functions to 
compute their parameters with a least squares adjustment process. In the case of stereo-images, 
a block adjustment with RPC is applied [66] for the image orientation. RPCs are widely adopted 
by image vendors and government agencies, and by almost all commercial photogrammetric 
workstation suppliers. 


4.1. Image matching 


Image matching is referred to the establishment of correspondences between primitives 
extracted from two (stereo) or more (multi-view) images. Once the image coordinates of the 
correspondent points are known in two or more images, the object 3D coordinates are 
estimated via collinearity or projective model. In image space this process produce a depth 
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map (that assigns relative depths to each pixel of an image) while in object space we normally 
call it point cloud. For more than 50 years, image matching has been an issue of research, 
development and practical implementation in software systems. Gruen [67] reports the 
development of image matching techniques over the past 50 years while in [68] more critical 
analyses and examples are reported. 


Today many approaches for image matching exist. One first classification is based on the used 
primitives, which are then transformed into 3D information. According to these primitives, 
the matching algorithms can be area-based matching (ABM) or feature-based matching (FBM) [69]. 
ABM, especially the LSM method with its sub-pixel capability, has a very high accuracy 
potential (up to 1/50 pixel) if well textured image patches are used. Disadvantages of ABM are 
the need for small searching range for successful matching, the large data volume which must 
be handled and, normally, the requirement of good initial values for the unknown. Problems 
occur in areas with occlusions, lack of or repetitive texture and if the surface does not corre- 
spond to the assumed model (e.g. planarity of the matched local surface patch). FBM is often 
used as alternative or combined with ABM. FBM techniques are more flexible with respect to 
surface discontinuities, less sensitive to image noise and require less approximate values. 
Because of the sparse and irregularly distributed nature of the extracted features, the matching 
results are in general sparse point clouds which are then used as seeds to grow additional 
matches. 


Another possible way to distinguish image matching algorithms is based on the created point 
clouds, i.e. sparse or dense reconstructions. Sparse correspondences were the initial stages of 
the matching developments due to computational resource limitations but also for a desire to 
reconstruct scenes using only few sparse 3D points (e.g. corners). Nowadays all the algorithms 
focus on dense reconstructions-using stereo or multi-view approaches. A dense matching 
algorithm should be able to extract 3D points with a sufficient resolution to describe the object’s 
surface and its discontinuities. Two critical issues should be considered for an optimal 
approach: (i) the point resolution must be adaptively tuned to preserve edges and to avoid too 
many points in flat areas; (ii) the reconstruction must be guaranteed also in regions with poor 
textures or illumination and scale changes. 


A rough surface model of the terrain is often required in order to initialize the matching 
procedure. Such models can be derived in different ways, e.g. by using a point cloud interpo- 
lated from the tie points measured at the orientation stage, from already existing terrain 
models, or from a global surface model generated with 3-second DEM by Shuttle Radar 
Topography Mission (SRTM). The matching procedures for terrain modelling are generally 
pyramid-based, that is, they start from a high image pyramid level, and at each pyramid level 
matching is performed; the results at one level are used to update the terrain range at the next 
lower pyramid level. In this way, the ambiguity of terrain variation is reduced at each pyramid 
level and search range are reduced as well. The algorithm convergence is a function of terrain 
slope, accuracy, and pixel size. 


As dense matching is a task involving a large computing effort, the use of advanced techniques 
like parallel computing and implementation at GPU / FPGA level can reduce this effort and 
allow real-time depth map production. 
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The accuracy and reliability of the derived 3D coordinates rely on the accuracy of the sensor 
calibration and image orientation, the accuracy and number of the image observations and the 
imaging geometry (i.e. the effects of camera optics, image overlap and the distance camera- 
object). A successful image matcher should also employ strategies to monitor the matching 
results and quality with statistic parameters. The weaknesses of the image matching technique 
for DEM generation is the requirement of good texture in the images, the absence of dark and 
shadowed areas and the relative small baseline over flying height (B/D) ratio in order to have 
very small perspective distortion in the images. 


5. DEM Quality assessment 


The following evaluation is the result of two studies conducted on DEM estimation from VHR 
optical sensors from spaceborne platforms, and deeply described in [55], [70] and [71]. In these 
studies stereo scenes from three latest VHR resolution sensors were taken into account: 
GeoEye-1 (GE1, launched in 2011, WorldView-2 (WV2, launched in 2012) and Pleiades-1A 
(PL1, launched in 2012). 


In the first study DEMs were generated from stereo images acquired by GE1 on Dakar (Senegal) 
and Guatemala City (Guatemala) and by WV2 on Panama City, Constitucion (Chile), Kabul 
(Afghanistan), Teheran (Iran), Kathmandu (Nepal) and San Salvador (El Salvador). The aim 
of this work was to evaluate the potential of VHR satellite images for the modeling of very 
large urban areas for the extraction of value-added products for hazard, risk and damage 
assessment. 


The main characteristics of the images and areas are briefly reported. Due to the large extent 
of the cities (up to 1’500 km’), the datasets generally consist of multiple pairs (or couples) of 
stereo images acquired by the same sensor and cut in tiles. If the stereo pairs are acquired in 
the same day, the time difference between their acquisitions is less than 1 hour. In case of 
multiple dates, differences are up to 3 months (i.e. Dakar). The viewing angles and conse- 
quently the convergence angle and B/H ratio are not the same for all stereo pairs. Different 
situations occur: a) one quasi-nadir image (acquisition angle close to the vertical) and one off- 
nadir image (backward or forward viewing); b) one backward and one forward image with 
symmetric angles, and c) one backward and one forward image with asymmetric angles and 
large convergence angle. Large viewing angles determine the presence of occlusions, mainly 
in urban areas, larger shadows, and larger GSD, with respect to quasi-nadir acquisitions. Areas 
like San Salvador and Kabul, smaller than 15-17 km in width, were scanned in one path. In 
case of larger areas the images were acquired in the same day from two (Panama City, 
Guatemala City, Constitucion) or three (Teheran or Kathmandu) different paths, or in different 
days (Dakar). In each path the acquisition angles are almost constant, but different between 
paths. This might cause differences in the DEM on the overlapping areas between paths, as 
the sensor performance for DEM generation depends on the B/H ratio, and consequently on 
the incidence angles of the stereo images. 
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GE1 stereo images were provided at geo-referenced processing level, while the WV2 ones were 
provided in raw level. In all cases, the rational polynomial coefficients (RPC or RPB formats) 
available for each image or tile were used as geo-location information for the geometric 
processing. 


In general the geo-location accuracy of the RPC depends on the image processing level, the 
terrain slope and the acquisition viewing angles ([57],[58]). In flat areas like in Panama City 
(level 2A,-16° and 16° viewing angles) the relative accuracy of the RPC between two images 
composing a stereo pair is approximately 30m, while in the mountain area around Kathmandu 
(level 1B,-31° and 15° viewing angles) the relative accuracy between the stereo images reaches 
300m. 


The datasets mainly cover inhomogeneous urban areas with different layout: dense areas with 
small buildings, downtown areas with skyscrapers, residential areas, industrial areas, forests, 
open areas and water (sea, lakes, and rivers). In addition images include rural hilly and 
mountain areas surrounding the cities, with important height ranges: 2’400m in case of 
Kathmandu, almost 2’300m in case of Teheran, 1’100m in case of Guatemala City, 1’000m in 
case of Kabul. Some regions were not visible in the images because occluded by clouds or very 
dark cloud shadows, as in Panama City and Kathmandu cities. 


The processing for DEM generation was applied separately to each dataset. On the orientation 
point of view, the geometric model for spaceborne pushbroom sensors based on RPC was used. 


Ground points were not available, so at least the relative orientation of the images was 
guaranteed. Common tie points in two or more images were measured homogeneously in the 
images in order to ensure the relative orientation between the two images of the same stereo 
pair and between different stereo pairs that overlaps along or across the flight direction, and 
get a stable block. A minimum of 5 tie points for each pair were measured manually by an 
operator on well-defined and fixed/stable features on the terrain (i.e. crossing lines, road signs). 
Taking into account the ground spatial resolution of the input images, the DEMs were 
generated with 2m grid spacing (about 4 times the pixel size of the panchromatic channels). 
In case of projects with overlapping stereo pairs (i.e. Teheran, Kathmandu, Dakar, etc.), the 
DEM was computed separately for each stereo pair and then the single DEMs were merged 
using linear interpolation. 


The second work on DEM quality assessment was carried out on a testfield with accurate 
ground reference data [70]. The testfield is located in Trento, in the Northeast of Italy, and 
varies from urban areas with residential, industrial and commercial buildings at different sizes 
and heights, to agricultural or forested areas, and rocky steep surfaces, offering therefore a 
heterogeneous landscape in term of geometrical complexity, land use and cover. The height 
ranges from 200m to 2100m. The testfield includes several heterogeneous datasets at varying 
spatial resolution, ranging from satellite imagery to aerial imagery, LiDAR data, GNSS 
surveyed points, GIS data, etc. For the scope of this Chapter, the stereo images from VHR 
satellite sensors used for DEM quality assessment are: 


a. A WorldView-2 (WV2) stereo-pair, acquired on August 22, 2010. The first image was 
recorded in forward scanning mode with an average in-track viewing angle of 15.9°, while 
the second one was acquired in reverse scanning mode with an average in-track viewing 
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angle of-14.0°. The processing level is Stereo 1B, i.e. the images are radiometrically and 
sensor corrected, but not projected to a plane using a map projection or datum, thus 
keeping the original acquisition geometry. The available channels are the panchromatic 
one and eight multispectral ones. The images cover an area of 17.64«17.64 km. The images 
were provided with Rational Polynomial Coefficients (RPCs). 


b. A GeoEye-1 (GE1) stereo-pair, acquired on September 28, 2011. Both images were 
recorded in reverse scanning position, with a nominal in-track viewing angle of about 15° 
in forward direction and-20° in backward direction. The stereo images are provided as 
GeoStereo product, that is, they are projected to a constant base elevation. The available 
bands are the panchromatic one and four multispectral bands (blue, green, red, and near 
infrared). The images cover an area of 10x10 km. For each image the RPCs were provided. 


c. Atriplet from Pléiades-1A (PL1) sensor, acquired on August 28, 2012. The average viewing 
angles of the three images are, respectively, 18°,-13° and 13° in along-track direction with 
respect to the nadir and close to zero in across-track direction, while their mean GSD varies 
between 0.72m and 0.78m, depending on the viewing direction. The Pleiades images were 
provided at raw processing level called “Primary”, that is, with basic radiometric and 
geometric processing. This product is indicated for investigations and production of 3D 
value-added products [72]. The three images cover an area of about 392km. 


For the geometric processing of the stereo scenes, the commercial software SAT-PP (SATellite 
image Precision Processing) by 4DiXplorer AG [73] was used. Information about the software 
functionalities and the approaches for image orientation and DEM generation is given in [74]. 
The images were oriented based on the RPC-based model, by estimating the parameters 
modelling an affine transformation to remove systematic errors in the given RPCs. For this 
operation, a selection of available ground points visible in both images was used. A sub-pixel 
accuracy was reached in the orientation both for WV2 and GE1 stereo-pairs and PL1 triplet. 
The DEMs were generated with a multi-image least-square matching, as described in [74], 
using a grid space equal to 2 times the GSD, which leads to 1 m geometric resolution surface 
models. Few seed points were manually measured in the stereo images in correspondence of 
height discontinuities to approximate the surface. The DEMs were neither manually edited 
nor filtered after their generation. Table 1 shows the DEM analysis in three test areas: 1. Trento 
city center, characterized by small and buildings closed to each other, 2. Trento train station, 
with large flat buildings and open areas and 3. Fersina residential area, with separated 
buildings and open areas. The reference LiDAR DEM are shown together with the error maps 
of GE1, WV2 and PL1 DSMs. In the error maps the height differences in the range [-1 m, 1 m] 
were plotted in white, as they are within the intrinsic precision of the sensors, while large 
positive and negative errors were highlighted in red (LiDAR above the DEM) and blue (DEM 
above LiDAR), respectively. 


By visual analysis of the DEMs, it was observed that in all datasets the surface shape was well 
modelled with both sensors (GE1 with processing level 2A and WV2 with processing levels 
1B and 2A). In mountain areas with large elevation difference, like Trento, Teheran, Kabul and 
Guatemala City, the shape of valleys and mountain sides and ridges is well modeled in the 
DEM (Figure 7). In comparison to SRTM, using VHR images it is possible to extract finer DSM 
and to filter the Digital Terrain Model (DTM) at higher grid space. This is confirmed by the 
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comparison of the height profiles of the WV2 DTM and the SRTM in Teheran (Figure 8). In 
rural areas, cultivated parcels can be distinguished, together with paths and lines of bushes 
and trees along their sides. In case of forest, the DEM clearly shows a different height with 
respect to adjacent cultivated areas or grass. It is even possible to distinguish roads and rivers 
crossing forests (Figure 9). In general rural areas are well modelled both in mountain, hilly 
and flat terrain. 


In urban areas building agglomerations, blocks with different heights, the road network, some 
infrastructures (i.e. stadium, bridges, etc.) and rivers are generally well outlined both in flat 
and hilly terrains (Figure 10, Table 1). In residential and industrial areas it is possible to 
distinguish single buildings and lines of trees (Figure 11, Table 1). In some cases the roof 
structures of large and complex buildings are modeled (Figure 12). Errors are encountered 
between buildings, as narrow streets are not visible in the stereo-pairs due to shadows or 
occlusions. In these cases the DEMs overestimates the height of the terrain, as they do not 
model the street level (Table 1). In case of Trento dataset (Table 1), the two large red spots 
(highlighted in red and blue in the LiDAR DEM) occur on churches, in all three DEMs. The 
matching failure is likely due to the homogeneous material used for the roof cover, the 
shadowing, and, eventually, the structure geometry. The manual measurements of seed points 
on the two buildings would certainly help the matching procedure. In the train station test 
area, some differences are due to the presence of trees (blue circle, LIDAR is below the DEMs) 
and to a change in the area (red circle, LiDAR is above the DEMs), that is, a building was 
demolished between LiDAR and WV2/GE1/PL1 acquisitions. Regarding the third test area 
(Trento Fersina), significant height differences occur in correspondence of trees (blue areas) 
and between tall buildings. 


By comparing the statistics of the error maps, in general there is a good agreement between 
the three surface models generated by PL1, WV2 and GE1 and the values in the statistics 
confirm the above analysis. The three surface models give similar results in terms of minimum 
and maximum values, mean value and RMSE, being Pleiades slightly better than the other 
two. 


From the analyses above reported, it can be concluded that failures in surface modelling from 
VHR optical sensors can be caused by a number of factors, which are summarized in Table 2. 


The image absolute geolocation accuracy, which depends on the viewing angle, on the 
processing level and on terrain morphology, influence the estimation of the height in object 
space, and therefore the quality of the absolute geolocation of the final DEM and related 
products (DTM, orthophotos, 3D objects). The measurement of accurate and well distributed 
ground control points (GCPs) in the images would solve the problem, but this information is 
often not available. The accuracy of the relative orientation between two images forming a pair 
is crucial for the epipolar geometry and image matching, while the relative accuracy between 
overlapping stereo pairs is responsible of height steps in the final DEM (Figure 13). The size 
of the height steps depends on the accuracy of the geometric orientation of each stereopair, 
and generally shows a systematic behavior. In both cases the relative orientation can be 
improved by manually measuring a sufficient number of common tie points between the 
images. 
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Figure 7. Trento. DEM from the GE1 stereo-pair; visualization in colour-shaded mode in SAT-PP software. 
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Figure 8. Teheran. Height profile in WV2 DTM (blue) and SRTM (red) and zoom in the corresponding surface models 
(above: SRTM, below: WV2 DTM). 
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Figure 9. Left: DEM of Rural area in Constitucion. Right: Original image (panchromatic) 
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Figure 10. DEM of dense urban area on flat and hilly terrain in Dakar. 
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Figure 11. DEM of residential and rural area on flat terrain in Panama City. The black oval highlights a line of tree. 


Figure 12. Left: DEM of Panama City Airport. Right: original image (pan sharpened). 
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Error map of PL1 triplet DEM 
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o: 0.06 RMSE: 6.12 g: 0.22 RMSE: 6.52 o: 0.25 RMSE: 6.73 


Table 1. DEM analysis on Trento testfield. Quality evaluation of WV2, GE1 and PL1 DEMs with respect to the LIDAR 
DEM: orthophoto with test areas contour (light blue) and profile transect (yellow), error planimetric distribution, 
statistics (minimum m, maximum M, mean u, standard deviation o, RMSE) and height profiles. Measures are in meters. 


Low-textured and homogenous areas origin blunders in the DEM, as the automatic matching 
of the homologous points fails. This is typical in homogeneous land cover (i.e. bare soil, parking 
lots) and shadow areas and is caused by a combination of sun and satellite elevations and 
surface morphology (i.e. mountain faces). In Figure 14 building shadows, highlighted in the 
yellow ellipse, bring inaccuracies in the DSM. The use of a better initial DEM as initial 
approximation can help the matching procedures in these critical areas. If a DEM is not 
available, so-called seed points can be measured in stereo mode in the pairs and imported in 
the matching procedure as mass points. In addition, an ad-hoc radiometric processing can 
enhance details in low-textured regions and help the matching procedure. 
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Occlusions are generally present in urban areas and are due to tall buildings or trees, in 
combination with the acquisition viewing angles. In case of occlusions, corridors between 
buildings are not modeled correctly (Figure 15). To overcome this drawback, multi-angular 
acquisitions, like in Pleiades constellation, could reduce occlusions in the images. 


Objects moving during the acquisitions of the stereo images, like vehicles, lead to small 
blunders, as highlighted in the blue circles in Figure 14. They can be removed with manual 
editing or filtering. Local blunders in correspondence of special radiometric effects, like 
spilling and saturation on roof faces due to the acquisition geometry and the surface type and 
inclination (i.e. roof faces in grass), may also occur. 


Figure 13. Height step (mean value: 3.5m, black rectangle) between the DEMs obtained from two different stereo 
pairs (Dakar). 


Figure 14. Example of effects of shadows and moving objects in the DEM (Panama City). Above: pansharpened nadir 
scene; below: DEM 
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Figure 15. Example of corridor occlusion due to tall buildings; (a) and (b): stereo images; (c) resulting DEM (Dakar). 


Factor Cause/Dependency Effect in DEM Possible solution 
Poor initial absolute geo-location Viewing angles Height steps GCPs 
accuracy Processing level Poor final absolute 
Terrain shape geolocation quality 
Time interval between Large extent Height steps in Tie points 
overlapping acquisitions Swath width of VHR sensors overlapping areas measurements 
Differences in the images Moving objects (vehicles) Local blunders DEM editing 
Filtering 
Shadows Large viewing angle Mismatches Radiometric 
Sun inclination preprocessing 
Surface morphology 
Low texture areas Land covers (parking lots, bare Mismatches Radiometric 


soil, etc.) preprocessing 
Seed points 
Spilling/saturation of roof Radiometry Local blunders Masking 
Sun and satellite elevation 
Surface inclination 
Surface material 
Occlusions in urban areas Convergence angle Local blunders Seed points 


Tall buildings Wrong heights on streets Multi-angular 
acquisitions 
Height range Steep mountain Low details Seed points 
Vertical steps 
Cloud cover Whether conditions Mismatches Masking 
Water (lakes, sea) Bad quality DEM Masking 


Table 2. Summary of factors influencing the DEM quality. 
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6. DEM Fusion 


DEMs of large areas are usually generated either by SAR interferometry, laser scanning 
(mainly from airborne platforms) or photogrammetric processing of aerial and satellite images. 
As demonstrated in the previous sections, each sensing technology and processing algorithm 
(interferometry, image matching) shows its own strengths and weaknesses (see Table 3). Even 
within a single technology a highly varying DEM quality may be found. 


DEMs can be generated at different coverage levels, ranging from very local areas to near- 
global coverage. Especially taking into account digital models offering a vast land coverage, 
one can see that the accuracy, error characteristics, completeness and spatial resolution offered 
by these products can vary wildly. These products and the possibility of their joint exploitation, 
with their respective issues, are the object of the techniques introduced in this Section. 


6.1. The motivation: Complementarity 


As mentioned, each different DEM generation approach and platform of origin have their 
advantages and drawbacks. Such characteristics seem to point into an interesting direction. 
Especially if one directs its attention on a cross-platform comparison. In Table 3 some charac- 
teristics between optical and SAR DEM generation are resumed. It can be observed that the 
two technologies almost perfectly complement each other. 


Characteristic SAR Optical 
Influenced by clouds No Yes 
Influenced by atmospheric water vapour Yes No 
Influenced by sun illumination No Yes 
Performing on poorly textured areas Yes No 
Performing on edge features No Yes 
Influenced by layover effect Yes No 
Surface height estimate Yes Yes 


Table 3. Comparison between SAR and Optical sensing technologies: advantages and drawbacks. 


The complementarity between these platforms is also visible if one takes into account the 
respective DEM quality maps shown in Figure 1 for the SAR example and in Figure 16 for the 
Optical one. Note that the quality scales are the opposite between the two, since the optical 
one describes matching cross-correlation combined with other factors (the higher the better), 
while the other defines the precision in meters (the lower the better). By analysing these 
examples, one can see that where the SAR result does not excel, i.e. on edges, the optical one 
shows better performance. And vice-versa where the optical result is not as strong, i.e. poorly 
textured areas, the SAR one shows consistently better results. 
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Figure 16. Example of a matching Pléiades-1A cross-correlation map over Trento, Italy. 


As a consequence, a joint exploitation of the elevation products deriving from these platforms 
can potentially be highly profitable, since the information on which one is weak can be 
replaced/completed by the one supplied by the other. 


The second typology of data that seems a legitimate candidate for the fusion can be defined 
as intra-platform, in the sense that two products coming from the same platform may as well 
be complementary to one another. In SAR, for instance, one may have both a DEM generated 
with ascending data and one with descending data, as shown in Figure 17, evidently comple- 
menting each other. 


Figure 17. Example of intra-platform complementarity. Point cloud representation of an ascending ERS-1/2 DEM 
(left), a descending ERS-1/2 DEM (right) and the combination of the two (center). 


In Figure 17 the digital terrain models are reproduced as point clouds (i.e. not interpolated). 
This is a less biased way to represent elevation data, even if at the expense of the ease of 
visualization and manipulation. 


There are, however, constraints that directly arise in order to be able to compare and combine 
the fusion candidates. The first one, easier to solve, is dictated by the different pixel spacing. 
This issue is simply solved by oversampling the coarser DEM and its corresponding quality 
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map. The resampling does not imply that the data has acquired the same precision as the finer 
one, and, that the oversampling procedure implies a re-estimation of the data values. In the 
intra-platform case this can be avoided, as in the given example, since the products already 
possess comparable characteristics non requiring a re-estimate. 


The second constraint is to have a comparable quality index. As shown, the quality maps 
proposed in Figure 1 and Figure 16 imply different types of measures, which must be re- 
interpreted in order to acquire a cross-significance. The way these indexes can be obtained is 


given in Table 4. 
Quality Map Factors 
- baseline 
- wavelength 
SAR DEM Precision Map - interferometric coherence 


- local incidence angle 


- spatial ground resolution 


Suen i - matching parameters 
ica ccuracy Ma 
P te - slope, roughness and aspect (geomorphological criteria) 


Table 4. Quality map estimation by technique. 


In the following sections two different data representation levels, on which the fusion ap- 
proaches are based, will be defined. Example approaches will be outlined as well. 


6.2. Raster level fusion 


The primary level which is directly available for fusion is the raster level. In fact, when 
represented in raster format, DEMs define a height value (when available) for each cell location 
on a regularly spaced grid covering the whole study area, and are a suitable input for raster- 
based fusion approaches. 


By exploiting the quality estimates from the previous section, the easiest and sometimes more 
appropriate method to fuse two raster data is the weighted combination of the two observa- 
tions, namely, a weighted average. This process is executed between DEMs covering the same 
surface and having the same pixel spacing, resulting in a very fast cell-to-cell calculation. The 
simplicity of the approach, however, may badly influence the results. The most important 
aspect to keep in mind when fusing two different datasets is their accuracy with respect to 
their spatial resolution. When the latter greatly differs between the two, their combination may 
result in an over-smoothing effect as well as single pixel aberrations and abrupt discontinuities 
(which in turn represent over-fitting). This results in a major degradation of the information 
contained in the datasets. When over-smoothing, the quality of the product with higher spatial 
resolution will be degraded, while when over-fitting the spatial characteristics will be ex- 
tremely exaggerated/accentuated. As such, this kind of processing better suits products having 
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the same pixel spacing from the start, rather than oversampled, and shall be avoided if the 
original pixel spacing difference is too large. 


More advanced methods have been proposed to overcome the issues due to an over simplistic 
approach. These techniques keep into account the values and statistics not only in the single 
pixels but also in their neighbours, by exploiting statistics windows, extending the process to 
a less local estimate and thus ensuring a spatial consistency of the results. This type of 
approaches are conceptually less sensitive to the over-smoothing and over-fitting effects, since 
the spatial characteristics of both input data are taken into account. In general, during the 
fusion process, a higher priority is given to the input with finer spatial resolution. These 
approaches are more desirable than the first one based on weighted average, since they are 
less prone to produce outputs greatly inferior to the inputs and, moreover, they are less 
influenced in case of oversampling, since the window on the finer input will have a higher 
weight. The difference in pixel spacing should ideally be kept quite small even in this case. 


One example of approach following these assumptions can be found in [75], where also 
additional information is used as input. The latter is constituted of a database (dictionary) of 
small DEM patches collected over existing DEMs showing similar characteristics to the ones 
to be fused. Sparse representations are used to solve the fusion problem. In this approach 
Orthogonal Matching Pursuit is used to identify the most suitable input patches to be used in 
weighted linear combination defining the output. An example of raster level fusion product 
is given in Figure 18. 


Figure 18. Example of raster based DEM fusion (center) between a SPOT-5 product (right) an ALOS-PALSAR-1 product 
(left). 


6.3. Point cloud level fusion 


The second product level available for data fusion is the Point Cloud level. Fusing point clouds, 
instead of rasterized surfaces, comes natural if one takes into account how Interferometry and 
Image Matching work, i.e. in a point-wise fashion. This allows to lessen an effect that occurs 
during the rasterization step of the outputs, i.e. error propagation. By avoiding rasterization, 
the error value computed in the height estimation process is preserved as is, and corresponds 
to the input data alone, thus preserving the quality of each observation. 
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The approaches based on point cloud level, however, show a greater complexity. Firstly, it is 
a kind of data which is conceptually more abstract to handle than the raster one. While the 
latter can easily be associated to matrices, the former are vectorial point objects in three 
dimensions (see Figure 19). This also implies a conceptual complexity of the approach to design 
for their fusion. The second aspect to take into account is the data size and point density. In 
case of SAR DEMs covering very large areas, for example, the study area may show very high 
coherence over the whole scene, implying a large amount of points with very large height 
variability over a restrained two dimensional (coordinates) space. Finally, with respect to 
raster-based approaches, more complex procedures must be designed for the inclusion of the 
information provided by the quality measures into the process. 


Figure 19. Example of image matching results, shown as point cloud, obtained from Pléiades-1A imagery over the city 
of Trento. 


A possible solution to the overabundance of data is a preliminary sample selection/substitution 
phase, in which the information about the data can be exploited. In SAR-SAR fusion, precision 
can be used to drive the choice between redundant points. In SAR-Optical fusion, the instru- 
ment behaviour resumed in Table 3 can be exploited as well. An ideal approach would for 
instance prefer measurements coming from Optical edge matching rather than its interfero- 
metric counterpart, vice-versa points produced by matching a regular grid over poorly 
textured areas should be substituted by the interferometric ones. Note however that slope also 
remains a key factor to keep into account (see Table 3). 


During the image matching process it is possible to identify which one is which, and this 
information should be taken into account and stored, as shown in Figure 20. The selection 
phase can also be used as an initial combination phase, computing local statistics such as the 
slope between points, in order to eliminate outliers and substitute the observations according 
to their quality measure and characteristics. 
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Figure 20. Breakdown of a subset of Figure 19 between: grid-based matching points (left), feature-based matching 
points (center), edge-and area-based matching points (right). 


The second step to accomplish is the generation of the fused DEM itself, hence the definition 
of an algorithm capable to handle such data and produce the final model. Interpolators can be 
applied to estimate the final model from the newly defined point cloud. In this perspective, 
two main types of interpolators can be considered. The first one are the so-called exact 
interpolators [76], according to which the output estimating function must pass by fixed points 
(input observations). The second interpolators are the confidence-interval aware interpolators, 
for which the output function should pass between an “envelope” from the points at which 
the interpolant is fitted. This concept is shown in Figure 21. In the left image the output function 
is fitted to well selected samples. In the centre image, some samples showing a high error and 
outlying the correct estimate were not filtered, resulting in an estimate function (solid line) not 
reproducing the correct function (dashed line). In the right image, the outliers are still present, 
but thanks to the error aware interpolator, the function would ideally be better or correctly 
estimated. Note that ideally this error envelope should be user defined, exploiting the a priori 
quality knowledge offered by SAR Precision, Optical cross-correlation, slope and feature type, 
for each input point. 


Figure 21. Example of a function estimated from an exact interpolator (left), from an exact interpolator with outliers 
(center) and from an error aware interpolator (right). The solid line is the estimate function, the dashed line the correct 
function. Green dots are good selected samples, red dots bad selected samples, vertical lines define the error enve- 
lope. 


The first family of interpolators include, for instance, the Radial Basis Function [77], which is 
widely recognized as reliable estimator. This kind of interpolators, however, does not take into 
account the quality index of each point, but, after a well-executed sample selection step, the 
fusion can be performed. Ideally, interpolators taking into account the confidence interval are 
better suited, especially if they offer the possibility to specify the confidence range at each data 
point while computing the interpolant. 
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The size problem, however, still remains. By using advanced interpolators, the results improve 
greatly in accuracy, but increase in computational cost, memory cost and complexity. A way 
to avoid these problems is to implement the RBF following a structure similar to the one of the 
Shepard Interpolator, as proposed in [78]. The process is resumed in Figure 22. 


Input Data Sample selection Compute Interpolant interpolate output 


. For each point use the 


e nheight e Quality and e Define m neighbourhoods 


estimates feature based of k points for each input interpolants in whose radius 


+ quality selection of m Train interpolant on each of influence it falls 


e Compute output from I results 


measures points (m < n) neighbourhood 


Figure 22. Diagram summarizing the steps of the proposed fusion approach. 


This approach also leaves the possibility to decide the regionality and distance influence of the 
interpolator, allowing for a fine tuning and a good trade-off between over-smoothing and over- 
fitting. Being very dense datasets, the possibility to control how local the interpolant should 
be is very important. An example of DEMs fused with this approach is shown in Figure 23. 


Figure 23. Examples of an ascending ERS-1/2 DEM (bottom-left), descending ERS-1/2 DEM (bottom-right), DEM fu- 
sion results (bottom-center) and the full DEM resulting from the fusion (top), over a region in Turkey. 
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The second family of interpolators is composed by more advanced techniques, which are 
currently used in a wide variety of study fields, as well as environmental studies. See [76] to 
have an extensive collection of such approaches. When taking into consideration these 
approaches, however, the very large size of the datasets becomes even more an issue. The 
computational and memory costs may well become prohibitive. A possible solution could be 
to apply a concept similar to the one introduced for Radial Basis Functions, reducing consid- 
erably the issues. 


7. Conclusions 


In this chapter, the SAR and Optical techniques applied to produce DEMs were introduced, 
leading to the topic of elevation data fusion. The main reasons strengthening the desirability 
of such an approach, especially at an inter-platform level, have been outlined. The fusion topic 
is still, however, an ongoing issue which has to be further investigated. The interest in 
developing this topic may even grow in the future. Especially considering the pace at which 
new instruments capable to extract elevation information are introduced, bringing with them 
new issues and shortcomings which may well be overcome by exploiting their complemen- 
tarities. The production of worldwide “absolute” single-date single-sensor elevation models 
still remains utopic, even for new missions such as TanDEM-X, Sentinel or Pléiades. The 
shortcomings of the instruments will be lessened, their spatial resolution improved, but their 
intrinsic characteristics will remain, along with their complementarity and therefore the 
interest in their fusion. Temporal resolution is also an important aspect, since throughout the 
year the ground’s physical characteristics can induce highly varying height estimates. The 
fusion of multi-date elevation models would allow to obtain products with a much more 
reliable terrain reproduction ability. Each fusion processing level has its advantages, the raster 
level surely is easier to handle than the point cloud one, while the latter offers a less biased 
data reproduction. For the sake of accuracy and reliability, the point cloud level is undoubtedly 
the better option on which to base future approaches. The dataset size-related issues opposing 
these approaches can be overcome by improving sample selection or by improving the 
interpolant computation step, greatly increasing the speed and manageability of the ap- 
proaches. Finally, approaches giving the ability to finely exploit the data intrinsic information 
to guide the fusion should be investigated. 
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